Qwen2.5-VL-3B-Instruct is the latest addition to the Qwen family of vision-language models from Alibaba's Qwen team, released on Hugging Face. It features enhanced capabilities in understanding visual content and generating structured outputs, and is designed to act as a visual agent that can directly operate computer and phone interfaces. Qwen2.5-VL can comprehend videos up to an hour long and localize objects within images using bounding boxes or points. It is available in three sizes: 3, 7, and 72 billion parameters.
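As a minimal illustration of the localization capability, the sketch below builds the multimodal chat-message structure typically passed to the model's processor in transformers. The image URL and prompt wording are placeholders, and running actual inference would require downloading the model, so only the request is constructed here.

```python
# Sketch of the chat-message structure used with Qwen2.5-VL-style models.
# The URL and prompt are hypothetical; this builds the request only and
# does not call the model.

def build_localization_request(image_url: str, target: str) -> list[dict]:
    """Build a multimodal chat message asking the model to localize an
    object and return bounding boxes as JSON."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {
                    "type": "text",
                    "text": f"Locate every {target} in the image and output "
                            "the bounding boxes in JSON format.",
                },
            ],
        }
    ]

messages = build_localization_request("https://example.com/street.jpg", "car")
print(messages[0]["content"][0]["type"])  # image
```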
Hugging Face researchers developed an open-source AI research agent called 'Open Deep Research' in 24 hours, aiming to match OpenAI's Deep Research. The project demonstrates the potential of agent frameworks to enhance AI model capabilities, achieving 55.15% accuracy on the GAIA benchmark. The initiative highlights the rapid development and collaborative nature of open-source AI projects.
Hugging Face's initiative to replicate DeepSeek-R1, focusing on developing datasets and sharing training pipelines for reasoning models.
The article introduces Hugging Face's Open-R1 project, a community-driven initiative to reconstruct and expand upon DeepSeek-R1, a cutting-edge reasoning language model. DeepSeek-R1, which emerged as a significant breakthrough, utilizes pure reinforcement learning to enhance a base model's reasoning capabilities without human supervision. However, DeepSeek did not release the datasets, training code, or detailed hyperparameters used to create the model, leaving key aspects of its development opaque.
The Open-R1 project aims to address these gaps by systematically replicating and improving upon DeepSeek-R1's methodology. The initiative involves three main steps: distilling a high-quality reasoning dataset from DeepSeek-R1, replicating the pure reinforcement-learning pipeline used to train it, and demonstrating an end-to-end path from a base model to an RL-tuned reasoning model.
Alibaba's Qwen 2.5 LLM now supports context lengths of up to 1 million tokens using Dual Chunk Attention. Two models have been released on Hugging Face, both requiring significant VRAM to run at full context. The article also discusses the challenges of deploying quantized GGUF versions under system resource constraints.
smolagents is a simple library that enables agentic capabilities for language models, allowing them to interact with external tools and perform tasks based on real-world data.
Hugging Face's SmolAgents simplifies the creation of intelligent agents by allowing developers to build them with just a few lines of code using powerful pretrained models.
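To make the agentic idea concrete, here is a toy sketch of the tool-calling loop that libraries like smolagents automate. The "model" is a stub that always requests a calculator tool; in a real agent, an LLM would decide which tool to call and with what input. All names here are illustrative, not part of the smolagents API.

```python
# Toy agent loop: the model proposes an action, the tool runs, and the
# observation is fed back until the model emits a final answer.
from typing import Callable

def calculator(expression: str) -> str:
    """A simple tool: evaluate an arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}

def stub_model(task: str, observations: list[str]) -> dict:
    """Stand-in for an LLM: ask for the calculator once, then answer."""
    if not observations:
        return {"action": "calculator", "input": task}
    return {"action": "final_answer", "input": observations[-1]}

def run_agent(task: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        step = stub_model(task, observations)
        if step["action"] == "final_answer":
            return step["input"]
        observations.append(TOOLS[step["action"]](step["input"]))
    return "max steps reached"

print(run_agent("2 + 3 * 4"))  # 14
```

The real library replaces `stub_model` with an LLM call and lets the model write and execute code to use its tools, but the observe-act cycle is the same.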
HunyuanVideo is an open-source video generation model that showcases performance comparable to or superior to leading closed-source models. It includes features like a unified image and video generative architecture, a large language model text encoder, and a causal 3D VAE for spatial-temporal compression.
NuExtract is a fine-tuned version of phi-3-mini for information extraction. It requires a JSON template describing the information to extract and an input text. Provides tiny (0.5B) and large (7B) versions.
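A sketch of how such a prompt is assembled: a JSON template naming the fields to extract, followed by the input text. The `<|input|>`/`<|output|>` delimiters follow the format described on the model card; verify the exact layout against the current card before relying on it.

```python
import json

# Assemble a NuExtract-style extraction prompt from a JSON template and
# an input text. Delimiters are taken from the model card and should be
# treated as an assumption.

def build_nuextract_prompt(template: dict, text: str) -> str:
    template_str = json.dumps(template, indent=4)
    return (
        "<|input|>\n"
        "### Template:\n"
        f"{template_str}\n"
        "### Text:\n"
        f"{text}\n"
        "<|output|>\n"
    )

template = {"name": "", "affiliation": "", "year_founded": ""}
prompt = build_nuextract_prompt(template, "Hugging Face was founded in 2016.")
print(prompt.startswith("<|input|>"))  # True
```

The model is then expected to complete the prompt with a JSON object matching the template's fields.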
Hugging Face introduces a unified tool use API across multiple model families, making it easier to implement tool use in language models.
Hugging Face has extended chat templates to support tools, offering a unified approach to tool use across model families: tools can be passed as ordinary Python functions, with JSON schemas generated automatically from their type hints and docstrings.
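To show the shape of the format these templates consume, the sketch below hand-builds a simplified JSON-schema-style tool description from a type-hinted, docstring'd function. transformers generates this automatically; the `to_tool_schema` helper here is a hypothetical, cut-down stand-in for illustration only.

```python
import inspect

# Build a simplified OpenAI-style tool schema from a Python function's
# signature and docstring, mirroring what chat templates consume.

PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location."""
    return 22.0  # stub: a real tool would call a weather API

def to_tool_schema(fn) -> dict:
    sig = inspect.signature(fn)
    properties = {
        name: {"type": PY_TO_JSON.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }

schema = to_tool_schema(get_current_temperature)
print(schema["function"]["name"])  # get_current_temperature
```

In practice the function itself (or the generated schema) is passed to `apply_chat_template` via its tools argument, so each model's template can render it in that model's expected format.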
DavidAU's model collection on Hugging Face includes various AI and ML models, such as GALAXY-XB, Mini-MOEs, TinyLlama, and Psyonic-Cetacean. These models are designed for text generation, single- and multi-LLM configurations, and automation tasks.